Balancing between over-weighting and under-weighting in supervised term weighting

Similar resources

Balancing between over-weighting and under-weighting in supervised term weighting

Supervised term weighting could improve the performance of text categorization. A way proven to be effective is to give more weight to terms with more imbalanced distributions across categories. This paper shows that supervised term weighting should not just assign large weights to imbalanced terms, but should also control the trade-off between over-weighting and under-weighting. Overweighting,...

Reducing Over-Weighting in Supervised Term Weighting for Sentiment Analysis

Recently the research on supervised term weighting has attracted growing attention in the field of Traditional Text Categorization (TTC) and Sentiment Analysis (SA). Despite their impressive achievements, we show that existing methods more or less suffer from the problem of over-weighting. Overlooked by prior studies, over-weighting is a new concept proposed in this paper. To address this probl...

Supervised Term Weighting Methods for URL Classification

Many term weighting methods have been suggested in the literature for Information Retrieval and Text Categorization. Term weighting, a part of the feature selection process, has not yet been explored for the URL classification problem. We classify a web page using its URL alone, without fetching its content; hence URL-based classification is faster than other methods. In this study, we investigate the use ...

Empirical Term Weighting

Our system used an empirical method for estimating term weights directly from relevance judgements, avoiding various standard but potentially troublesome assumptions. It is common to assume, for example, that weights vary with term frequency (tf) and inverse document frequency (idf) in a particular way, e.g., tf · idf, but the fact that there are so many variants of this formula in the literature suggests...

Relevance Weighting Using Distance Between Term Occurrences

Recent work has achieved promising retrieval performance using distance between term occurrences as a primary estimator of document relevance. A major benefit of this approach is that relevance scoring does not rely on collection frequency statistics. A theoretical framework for lexical spans is now proposed which encompasses these approaches and suggests a number of important directions for fut...

Journal

Journal title: Information Processing & Management

Year: 2017

ISSN: 0306-4573

DOI: 10.1016/j.ipm.2016.10.003